PySpark Data Engineer

Long Finch Technologies

Irving, TX / Jacksonville, FL / Jersey City, NJ

Posted On: Jan 03, 2025

Job Overview

Job Type

Full-time

Experience

9 - 12 Years

Salary

Depends on Experience

Work Arrangement

On-Site

Travel Requirement

0%

Required Skills

  • Hadoop
  • PySpark
  • ETL
  • SQL
  • Python
Job Description
Key Responsibilities
  • Design, build, and unit test applications on the Spark framework in Python.
  • Build PySpark-based applications for both batch and streaming requirements, drawing on in-depth knowledge of most Hadoop components and NoSQL databases.
  • Develop and execute data pipeline testing processes and validate business rules and policies.
  • Optimize performance of Spark applications in Hadoop using configurations around SparkContext, Spark SQL, DataFrame, and Pair RDDs.
  • Optimize performance for data access requirements by choosing the appropriate native Hadoop file format (Avro, Parquet, ORC, etc.) and compression codec.
  • Design and build real-time applications using Apache Kafka and Spark Streaming.
  • Build integrated solutions leveraging Unix shell scripting, RDBMS, Hive, HDFS, HDFS file types, and HDFS compression codecs.
  • Build data tokenization libraries and integrate them with Hive and Spark for column-level obfuscation.
  • Process large amounts of structured and unstructured data, including integrating data from multiple sources.
  • Create and maintain an integration and regression testing framework on Jenkins integrated with Bitbucket and/or Git repositories.
  • Participate in the agile development process; document and communicate issues and bugs relative to data standards in scrum meetings.
  • Work collaboratively with onsite and offshore teams.
  • Develop and review technical documentation for delivered artifacts.
  • Solve complex data-driven scenarios and triage defects and production issues.
  • Learn, unlearn, and relearn concepts with an open and analytical mindset.
  • Participate in code releases and production deployments.
  • Challenge and inspire team members to achieve business results in a fast-paced and quickly changing environment.
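As a rough illustration of the column-level tokenization work described above, a deterministic tokenizer might look like the following minimal sketch. This is an assumption, not the employer's actual library: the key handling and names (`TOKEN_KEY`, `tokenize`) are hypothetical, and keyed HMAC-SHA256 is just one common choice for irreversible, join-preserving obfuscation.

```python
import hmac
import hashlib

# Hypothetical secret key; in a real pipeline this would come from a
# vault/KMS, never be hard-coded.
TOKEN_KEY = b"example-secret-key"

def tokenize(value):
    """Deterministically tokenize a sensitive column value.

    Keyed HMAC-SHA256 maps the same input to the same token (so joins
    and group-bys still work) without being reversible from the token.
    """
    if value is None:
        return None
    return hmac.new(TOKEN_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

# In a PySpark job, such a function could be wrapped as a UDF and
# applied to a sensitive column, e.g.:
#
#   from pyspark.sql import functions as F, types as T
#   tokenize_udf = F.udf(tokenize, T.StringType())
#   df = df.withColumn("ssn", tokenize_udf(F.col("ssn")))
```

Keeping the hashing logic in a plain Python function (rather than inline in the UDF) makes it unit-testable outside Spark, matching the unit-testing responsibility above.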
Required Experience/Skills
  • 10+ years overall experience in Data Management, Data Lakes, and Data Warehouses
  • 6+ years with Hadoop, Hive, Sqoop, SQL, and Teradata
  • 6+ years with PySpark (Python and Spark) and Unix
  • Industry ETL experience is good to have
  • Banking domain experience

Job ID: LF250001


Posted By

Mayank Rawat